
Merge master into feature/host-network-device-ordering #6486


Merged
merged 29 commits into feature/host-network-device-ordering from master
May 27, 2025

Conversation

changlei-li
Contributor

No description provided.

snwoods and others added 29 commits May 20, 2025 11:38
Migration spawns 2 operations which depend on each other so we need to
ensure there is always space for both of them to prevent a deadlock.
Adding VM_receive_memory to a new queue ensures that there will always
be a worker for the receive operation so the paired send will never be
blocked.

Signed-off-by: Steven Woods <steven.woods@cloud.com>
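A minimal sketch of the dedicated-queue idea above (illustrative only, not xenopsd's actual scheduler): receive operations are routed to their own named queue, so a receive can never be starved by sends occupying every worker in the shared pool. The operation type, payloads, and queue names below are assumptions.

```ocaml
(* Illustrative routing only; the real xenopsd queueing code differs. *)
type operation =
  | VM_receive_memory of string (* VM id, hypothetical payload *)
  | VM_migrate of string
  | Other of string

(* Receives get a dedicated queue with its own worker(s), so the paired
   send on the default queue cannot deadlock waiting for a receive that
   has no free worker. *)
let queue_of_operation = function
  | VM_receive_memory _ -> "VM_receive_memory"
  | VM_migrate _ | Other _ -> "default"

let () =
  [VM_migrate "vm1"; VM_receive_memory "vm1"; Other "cleanup"]
  |> List.iter (fun op -> print_endline (queue_of_operation op))
```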
We've seen that using the policy can be up to 10% faster than using "any" in
some workflows, while not observing workflows that were negatively affected.
The per-VM policy can always be changed if need be.

Note that currently best-effort sometimes falls back to the same behaviour,
especially when restarting or starting more than one VM at a time. This needs
Xen patches to be fixed:
https://lore.kernel.org/xen-devel/20250314172502.53498-1-alejandro.vallejo@cloud.com/T/#ma1246e352ea3cce71c7ddc26d1329a368548b3b2

The deprecated numa-placement configuration option for xenopsd now does
nothing. It was exclusively used to enable Best_effort; since that is now the
default, there's no point in setting the option. Its value depends on whether
the default option is best_effort or not, as per the spec.

Signed-off-by: Pau Ruiz Safont <pau.ruizsafont@cloud.com>
This change introduces a new `repository_domain_name_blocklist`
that lists repo URL patterns to be blocked.
On XAPI startup, any existing pool repository whose URLs match an
entry in this blocklist will be automatically removed. This ensures
that, for example, when upgrading from XS8 to XS9, any XS8 repos are
purged.

Additionally, repository creation now checks the same blocklist and
rejects any attempt to add a blocked repo.

- On startup: read blocklist, delete matching blocked repos
- On repository creation: validate against blocklist and abort if matched

Signed-off-by: Stephen Cheng <stephen.cheng@cloud.com>
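A minimal sketch of the two checks described above, under assumptions: the blocklist is modelled as a list of domain substrings, matching is a naive regexp search over the URL (the real pattern matching may differ), and the blocklist entry and helper names are hypothetical. Requires OCaml's `str` library.

```ocaml
(* Hypothetical blocklist entry; real entries come from xapi configuration. *)
let repository_domain_name_blocklist = ["repo.example-xs8.test"]

(* Naive check: does the URL contain any blocked domain as a substring? *)
let is_blocked url =
  List.exists
    (fun domain ->
      try
        ignore (Str.search_forward (Str.regexp_string domain) url 0) ;
        true
      with Not_found -> false)
    repository_domain_name_blocklist

(* On startup: keep only repositories whose URLs are not blocked. *)
let purge_blocked existing_repo_urls =
  List.filter (fun url -> not (is_blocked url)) existing_repo_urls

(* On repository creation: abort if the new URL matches the blocklist. *)
let validate_new_repo url =
  if is_blocked url then Error "repository URL is blocked" else Ok url
```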
…g. (#6475)

This change introduces a new `repository_domain_name_blocklist` that
lists repo URL patterns to be blocked.
On XAPI startup, any existing pool repository whose URLs match an
entry in this blocklist will be automatically removed. This ensures
that, for example, when upgrading from XS8 to XS9, any XS8 repos are
purged.

Additionally, repository creation now checks the same blocklist and
rejects any attempt to add a blocked repo.

- On startup: read blocklist, delete matching blocked repos
- On repository creation: validate against blocklist and abort if
matched

Tests:
- Create repo with the blocklist configured

![image](https://github.com/user-attachments/assets/8c77b76e-27ef-4184-b5aa-c68b6ee7b9c4)
- Create repo without the blocklist configured

![image](https://github.com/user-attachments/assets/89cebfb8-431d-4690-a667-9ecad6730ba8)
- With the blocklist, restart xapi and the repo was removed
`[root@eu1-dt013 yum.repos.d]# xe repository-list`
We've seen that using the policy can be up to 10% faster than using "any" in
some workflows, while not observing workflows that were negatively affected.
The per-VM policy can always be changed if need be.

Note that currently best-effort sometimes falls back to the same behaviour,
especially when restarting or starting more than one VM at a time. This needs
Xen patches to be fixed:

https://lore.kernel.org/xen-devel/20250314172502.53498-1-alejandro.vallejo@cloud.com/

Also fix the legacy numa-placement configuration option for xenopsd. It was
always deciding the setting, even when not used; now it only takes effect when
it's present, otherwise it leaves the default option untouched.
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
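A minimal sketch of the numa-placement option handling described above (not xenopsd's actual code): the deprecated key only overrides the NUMA policy when it is explicitly present in the configuration; when absent, the Best_effort default is left untouched. The type and function names here are assumptions.

```ocaml
type numa_policy = Any | Best_effort

let default_policy = Best_effort

(* [numa_placement] is [Some b] only when the deprecated "numa-placement"
   key was present in the config file; [None] means it was not set. *)
let effective_policy (numa_placement : bool option) =
  match numa_placement with
  | None -> default_policy
  | Some true -> Best_effort
  | Some false -> Any

let () =
  assert (effective_policy None = Best_effort) ;
  assert (effective_policy (Some false) = Any)
```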
These two functions are the new SMAPIv3 functions that will enable
mirroring and querying of the mirror status, so implement them in
xapi-storage-script. The SMAPIv1 counterparts remain unimplemented.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
The similar VDI functionality is currently unused for SMAPIv3
migration, so just add a dummy implementation.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
This is the main commit that implements the MIRROR interface in
storage_smapiv3_migrate. The exact details of how SMAPIv3 mirroring is done
are left to the SXM documentation, but the core of it is to provide all the
necessary infrastructure to be able to call the `Data.mirror` SMAPIv3 call
that mirrors one VDI to another.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
dummy_vdi and parent_vdi are not created by
storage_smapiv3_migrate.receive_start2, so do not attempt to destroy
them in storage_smapiv3_migrate.receive_cancel2.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
This is to mimic the behaviour on SMAPIv1. The update_snapshot_info
function that runs at the end of migration will check for content_id,
and this is needed to make it happy.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Whilst it is not the default behaviour on XS 8 to attach a VDI through
NBD, SXM inbound into an SMAPIv1 SR needs to have NBD enabled for
mirroring purposes. As tapdisk will return usable NBD parameters to
xapi, they can be included in the return value of attach. Most current
users of this return value will keep using the blktap2 kernel device, and
this NBD information is only used during SXM.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
As this NBD proxy is used for importing data, call it `import_nbd_proxy`
to distinguish it from the `export_nbd_proxy` that will be introduced
later on.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
This is a bit of a layering violation, as storage_mux should not care
about which version of SMAPI the SR uses, nor should it be responsible for
calling hook functions. But there is no way for xapi-storage-script
to invoke code in xapi (which would also be a layering violation if it
were possible), and smapiv1_wrapper has special state-tracking logic for
determining whether the hook should be called, so leave the hook here for
now.

Note the pre_deactivate_hook is not called as currently that remains a
noop for SMAPIv3. And as we do not support VM shutdown during outbound
SXM for SMAPIv3 anyway, leave a hack in the storage_mux for now until we
have a plan on how to support that.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
The attach and activate of the VDI being live migrated is there so that
the SXM can keep working even if the VM on which the VDI is activated
shuts down. This is possible on SMAPIv1 as tapdisk does not distinguish
between different domain parameters, but that is not the case for
SMAPIv3.

For now, just avoid activating the VDI on dom0, since it is already
activated on the live_vm. This does mean that SXM will stop working if
the VM is shut down during storage migration. We will leave that case for
the future.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
There is a mirror_checker/tapdisk_watchdog for SMAPIv1 that periodically
checks the status of the mirror and sends an update if it detects a
failure.

Implement something similar for SMAPIv3 mirror, although this
check happens for a shorter period of time compared to the SMAPIv1
tapdisk_watchdog because the `Data.stat` call will stop working once the
VM is paused, and currently we have no easy way to terminate this mirror
checker just before the VM is paused (in xenopsd). So only do this check
whilst the mirror syncing is in progress, i.e. when we are copying over
the existing disk content.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
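A minimal sketch of the checking loop described above, under assumptions: `stat_mirror` stands in for the SMAPIv3 `Data.stat` call, the record fields and the 5-second poll interval are hypothetical, and the loop stops as soon as the initial copy is no longer in progress. Requires the `threads` library.

```ocaml
type mirror_stat = {syncing: bool; failed: bool}

(* Poll the mirror status while the initial copy is syncing; report a
   failure if one is seen, and stop once syncing finishes, because the
   stat call is no longer reliable after the VM is paused. *)
let rec check_mirror ~(stat_mirror : unit -> mirror_stat) ~on_failure =
  let s = stat_mirror () in
  if s.failed then on_failure ()
  else if s.syncing then (
    Thread.delay 5.0 ; (* poll interval: an assumption *)
    check_mirror ~stat_mirror ~on_failure
  )
```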
Previously the tapdisk watchdog in SMAPIv1 mirroring was cancelled in
the `post_deactivate_hook`, but at that point the VDI has already been
deactivated, and hence the mirror would have been terminated.
Additionally, the last time the stats are retrieved is in
`pre_deactivate_hook`, so do this cancelling after the last stats
retrieval.

Note that SMAPIv3 mirroring does not have a watchdog, due to the limitation
that the mirror job auto-cancels after a guest pause; instead, mirror
checking is only done whilst the mirror syncing (i.e. copying existing
disk content) is in progress.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
This is a continuation of #6439 in the effort of implementing outbound
SXM for SMAPIv3. We have reached the climactic point of this and can
actually now implement the logic to do outbound SXM for SMAPIv3 SRs!

This is a rather large PR and I expect it to take some time to be
reviewed and merged, so I am opening this early to gather some feedback.
Since #6439 is not yet merged and this PR depends on that one, I am
marking this one as a draft. While reviewing, please ignore the first
three commits in this PR and look at #6439 first instead. I will update
this one again when #6439 is merged.

There are also a couple of docs PRs at the end documenting the design and
approach taken in doing SMAPIv3 migration.

In terms of the testing plan, the important thing for now is to make sure
that this does not regress SMAPIv1 migration. For that I will be
using the SXM functional test suite. I will also be adding more tests to
actually test the SMAPIv3 SXM feature.
In commit 2eff6ab,
the HTTP handler was renamed to add an "import" in the URL, but we need
to keep the previous one for backwards compatibility. This is so that
previous versions of sparse_dd in XS 8 can migrate to the latest one.

Signed-off-by: Vincent Liu <shuntian.liu2@cloud.com>
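A minimal sketch of the backwards-compatibility idea, assuming a simple registration function that maps URI paths to handlers; the paths, `add_handler`, and `handler` below are illustrative placeholders, not xapi's real ones.

```ocaml
(* Register the same handler under both the new path (with "import" in the
   URL) and the legacy path, so that XS 8 sparse_dd clients that still use
   the old URL keep working. Paths are illustrative placeholders. *)
let register_nbd_proxy_handlers
    ~(add_handler : string -> (unit -> unit) -> unit) ~(handler : unit -> unit)
    =
  add_handler "/example/import_nbd_proxy" handler ;
  add_handler "/example/nbd_proxy" handler
```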
The RRD loop is executed every 5 seconds. It waits a fixed 5 seconds between
each iteration, but the loop body itself also consumes time (how much depends
on the number of CPUs; with many CPUs it may be hundreds of milliseconds).
This implementation makes the RRD loop drift by an offset after several
iterations. An RRD data point is then lost and a gap can be observed on the
XenCenter performance graph.

The solution is to use a fixed deadline as each iteration's start time and to
use a computed delay (timeslice minus the time consumed by the loop) instead
of a fixed delay.

Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
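A minimal sketch of the fixed-deadline scheduling described above (not the actual RRD loop code): each iteration's target start time is derived from the original start plus a whole number of timeslices, and the sleep is the time remaining until that deadline rather than a fixed 5 seconds, so work done inside the loop does not accumulate as drift. Requires the `unix` and `threads` libraries; names are illustrative.

```ocaml
let timeslice = 5.0 (* seconds *)

let run_loop ~(do_work : unit -> unit) =
  let start = Unix.gettimeofday () in
  let rec go n =
    do_work () ;
    (* deadline for the next iteration, anchored to the original start *)
    let next_deadline = start +. (float_of_int (n + 1) *. timeslice) in
    (* computed delay: timeslice minus the time the loop body consumed *)
    let delay = next_deadline -. Unix.gettimeofday () in
    if delay > 0.0 then Thread.delay delay ;
    go (n + 1)
  in
  go 0
```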
The RRD loop is executed every 5 seconds. It waits a fixed 5 seconds between
each iteration, but the loop body itself also consumes time (how much depends
on the number of CPUs; with many CPUs it may be hundreds of milliseconds).
This implementation makes the RRD loop drift by an offset after several
iterations. An RRD data point is then lost and a gap can be observed on the
XenCenter performance graph.

The solution is to use a computed delay (timeslice minus the time consumed by
the loop) instead of a fixed delay.
When a customer opens the "Migrate VM Wizard" in XenCenter, XenCenter calls
`VM.assert_can_migrate` for each host in each pool connected to XenCenter to
check whether the VM can be migrated to it. The API `VM.assert_can_migrate`
then calls `VM.export_metadata`, which locks the VM. During this time, other
`VM.export_metadata` requests will fail as they cannot acquire the VM lock.

The solution is to retry when failing to lock the VM.

Signed-off-by: Bengang Yuan <bengang.yuan@cloud.com>
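A minimal sketch of the retry idea described above (not xapi's actual code): retry an operation a bounded number of times when it fails because the VM lock is held, waiting between attempts. The `Vm_locked` exception, attempt count, and delay are assumptions. Requires the `threads` library.

```ocaml
exception Vm_locked

(* Run [f], retrying up to [attempts] times if the VM lock cannot be
   acquired; the final failure is re-raised to the caller. *)
let rec with_vm_lock_retry ?(attempts = 5) ?(delay = 1.0) f =
  try f ()
  with Vm_locked when attempts > 1 ->
    Thread.delay delay ;
    with_vm_lock_retry ~attempts:(attempts - 1) ~delay f
```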
…#6470)

Migration spawns 2 operations which depend on each other so we need to
ensure there is always space for both of them to prevent a deadlock
during localhost and two-way migrations. Adding VM_receive_memory to a
new queue ensures that there will always be a worker for the receive
operation so the paired send will never be blocked.

This will increase the total number of workers by worker-pool-size.
Unlike parallel_queues workers, these workers will be doing actual work
(VM_receive_memory), which could in theory increase the workload of a
host if it is receiving VMs at the same time as other work, so this
needs to be considered before merging this PR.
Sorry folks, looks like we need an epilogue to #6457; I forgot about this
backwards compatibility issue. Backwards compatibility is hard...
When the customers open "Migrate VM Wizard" on XenCenter, XenCenter will
call
`VM.assert_can_migrate` to check each host in each pool connected to
XenCenter
if the VM can be migrated to it. The API `VM.assert_can_migrate` then
calls
`VM.export_metadata`. `VM.export_metadata` will lock VM. During this
time, other
`VM.export_metadata` requests will fail as they can't get VM lock.

The solution is to add retry when failing to lock VM.
@changlei-li changlei-li merged commit c3ed9ca into feature/host-network-device-ordering May 27, 2025
122 of 125 checks passed